STRATIFICATION AND ENUMERATION OF BOOLEAN FUNCTIONS BY CANALIZING DEPTH

QIJUN HE AND MATTHEW MACAULEY 


ABSTRACT. Boolean network models have gained popularity in computational systems biology over the last 
dozen years. Many of these networks use canalizing Boolean functions, which has led to increased interest 
in the study of these functions. The canalizing depth of a function describes how many canalizing variables 
can be recursively “picked off”, until a non-canalizing function remains. In this paper, we show how every 
Boolean function has a unique algebraic form involving extended monomial layers and a well-defined core 
polynomial. This generalizes recent work on the algebraic structure of nested canalizing functions, and it 
yields a stratification of all Boolean functions by their canalizing depth. As a result, we obtain closed formulas 
for the number of n-variable Boolean functions with depth k, which simultaneously generalizes enumeration 
formulas for canalizing, and nested canalizing functions. 


1. Introduction 


Boolean networks were invented in 1969 by S. Kauffman, who proposed them as models of gene regulatory
networks [Kau69]. They were slow to catch on, but since a seminal paper [AO03] from 2003, where
Albert and Othmer modeled the segment polarity gene in the fruit fly Drosophila melanogaster, they have
emerged as popular models for a variety of biological networks. Random Boolean networks (RBNs) have
been studied throughout the years, with various restrictions on the functions or wiring diagrams to better
reflect salient properties of actual biological networks. For example, without such restrictions, RBNs
display chaotic behavior in the sense that they are very sensitive to small perturbations. In contrast, biological
systems must be robustly designed [LLL+04] in order to withstand a variety of internal (e.g., mutation or
gene knockout) and external (e.g., environmental) changes. In 1942, the geneticist H. Waddington defined
the concept of canalization to study this robustness. Over 30 years later, in [Kau74], Kauffman introduced
the notion of canalizing Boolean functions in order to accurately reflect the behavior of biological systems
in the setting of Boolean network models. Another thirty years after that, Kauffman and collaborators
further expanded the canalization concept and introduced the class of nested canalizing functions [KPST03],
which can be thought of as functions that are fully “recursively canalizing.”

In the last decade, canalizing functions have been extensively studied by researchers in the fields of 
mathematics, biology, physics, computer science, and electrical engineering. For example, Shmulevich 
and Kauffman showed that canalizing functions have lower activities and sensitivities than random Boolean 
functions, and this causes Boolean network models using these functions to be more stable; see [SK04] and
[KPST04]. More work on the dynamical stability of canalizing Boolean networks was done in [MA05] and
in [KH07], where the authors explored the relationship between the proportion of canalizing functions in
a network, and whether it lies in the ordered or chaotic dynamical regime, or near the so-called critical
threshold. The evolution of canalizing Boolean networks was studied in [SD07]. Fourier analysis has
shown that canalizing Boolean networks maximize mutual information [KKBS14]. An exact formula
was derived for the number of Boolean canalizing functions in [JSK04]. Canalizing functions have been
generalized from the Boolean setting to general finite fields in [ML12].

Nested canalizing functions (NCFs) have also gained significant attention. In [Pei10] and [KLAL14],
the authors study the phase diagram of Boolean networks with NCFs. A recursive formula for the number
of NCFs was derived in [JRL07], where they were shown to be what the electrical engineering community
calls unate cascade functions [BB78]. NCFs have been studied algebraically through the lens of toric
varieties [JL07], and in [LAM+13], where the authors obtained a unique algebraic form by writing an NCF
in extended monomial layers. This allowed the authors to enumerate the NCFs. It also provided
the tools for the development of an algorithm in [HJ12] to reverse-engineer a nested canalizing Boolean
network from partial data. In [LDM12], the authors generalized the notion of both canalizing and nested
canalizing functions by introducing the class of partially nested canalizing functions. Loosely speaking, 
these are the functions that are “somewhat recursively canalizing.” The dynamics of Boolean networks 
built with these functions has been studied in [LDM12] and jjM13ll . 

In this article, we carry out a detailed mathematical study on canalization of Boolean functions. Instead
of thinking of partially (or fully) nested canalizing functions as a subclass of Boolean functions, we
consider canalization as a property of all Boolean functions. We modify the notion of canalizing depth from
[LDM12] to quantify the degree to which a function exhibits a recursive canalizing structure. From here,
we show that every Boolean function has a unique algebraic form using extended monomial layers,
generalizing what was done for NCFs in [LAM+13]. Once one “peels off” these layers, a unique non-canalizing
core polynomial remains. This gives a well-defined stratification of all Boolean functions by canalizing
depth and monomial layers, which includes the canalizing, non-canalizing, and NCFs as special cases. We
say that a function is k-canalizing if it has canalizing depth at least k. Our stratification allows us to
derive exact formulas for the number of k-canalizing functions on n variables. The special cases of k = 1
and k = n yield the enumeration results of canalizing and nested canalizing functions from [JSK04] and
[LAM+13], respectively.

This paper is organized as follows. After introducing necessary preliminaries in Section 2, we define
k-canalizing functions, canalizing depth and core functions in Section 3. Next, we characterize Boolean
functions by a unique polynomial form in Section 4, and use this to stratify all Boolean functions by
extended monomial layers and their core polynomials, which are slightly different from the aforementioned
core functions. In Section 5, we use this structure to derive exact enumeration formulas for the number of
functions with a fixed canalizing depth. Finally, we end in Section 6 with some concluding remarks and
directions of current and future research.


2. Canalizing and nested canalizing functions 


To make this paper self-contained we will restate some well-known definitions; see, e.g., [KPST03].
This is also needed because there are slight variations in certain definitions throughout the literature. Let
$\mathbb{F}_2 = \{0,1\}$ be the binary field, and let $f\colon \mathbb{F}_2^n \to \mathbb{F}_2$ be an n-variable Boolean function.


Definition 2.1. A Boolean function $f(x_1,\dots,x_n)$ is essential in the variable $x_i$ if there exists a sequence
$a_1,\dots,a_{i-1},a_{i+1},\dots,a_n \in \mathbb{F}_2$ such that

$$f(a_1,\dots,a_{i-1},0,a_{i+1},\dots,a_n) \neq f(a_1,\dots,a_{i-1},1,a_{i+1},\dots,a_n).$$

In this case, we say that $x_i$ is an essential variable of $f$. Variables that are non-essential are fictitious.
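
Definition 2.1 can be checked directly on a truth table. The following is a minimal sketch, assuming a Boolean function is represented as a Python callable on a tuple of bits with 0-based variable indices; the helper name `essential_variables` and the sample function are illustrative and do not come from the paper.

```python
from itertools import product

def essential_variables(f, n):
    """Return the 0-based indices i for which f depends on x_i (Definition 2.1)."""
    essential = []
    for i in range(n):
        # Try every assignment of the other n - 1 variables.
        for rest in product((0, 1), repeat=n - 1):
            x0 = rest[:i] + (0,) + rest[i:]
            x1 = rest[:i] + (1,) + rest[i:]
            if f(x0) != f(x1):
                essential.append(i)
                break
    return essential

# x0*x1 + x1 over F_2, viewed as a function of three variables; x2 is fictitious.
f = lambda x: (x[0] & x[1]) ^ x[1]
print(essential_variables(f, 3))  # [0, 1]
```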


S. Kauffman defined canalizing Boolean functions in [Kau74] to capture the general stability of gene
regulatory networks. In that paper, a Boolean function $f$ is canalizing in variable $x_i$, with canalizing input
$a$ and canalized output $b$, if, whenever $x_i$ takes on the value $a$, the output of $f$ is $b$, regardless of the inputs
of other variables. As a consequence, constant functions are trivially canalizing. We will soon see why it is
more mathematically natural to exclude these functions, among others. This is done by the following small
adjustment to the original definition that does not change the overall idea.


Definition 2.2. A Boolean function $f\colon \mathbb{F}_2^n \to \mathbb{F}_2$ is canalizing if there exists a variable $x_i$, a Boolean
function $g(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$, and $a, b \in \mathbb{F}_2$ such that

$$f(x_1,\dots,x_n) = \begin{cases} b & x_i = a,\\ g \not\equiv b & x_i \neq a. \end{cases}$$

In this case, $x_i$ is a canalizing variable, the input $a$ is the canalizing input, and the output value $b$ when
$x_i = a$ is the corresponding canalized output.
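
As a hedged illustration of Definition 2.2, the sketch below lists every triple (variable, canalizing input, canalized output) of a small function by brute force. The representation (a callable on bit tuples, 0-based indices) and the name `canalizing_triples` are assumptions made for this example only.

```python
from itertools import product

def canalizing_triples(f, n):
    """All (i, a, b) such that f is canalizing in x_i with input a and output b."""
    points = list(product((0, 1), repeat=n))
    triples = []
    for i in range(n):
        for a in (0, 1):
            on = {f(x) for x in points if x[i] == a}    # values when x_i = a
            off = {f(x) for x in points if x[i] != a}   # values of g
            if len(on) == 1:
                b = next(iter(on))
                if off != {b}:                          # g is not the constant b
                    triples.append((i, a, b))
    return triples

print(canalizing_triples(lambda x: x[0] & x[1], 2))  # [(0, 0, 0), (1, 0, 0)]
print(canalizing_triples(lambda x: x[0] ^ x[1], 2))  # []
print(canalizing_triples(lambda x: 0, 2))            # []
```

The last call shows the effect of the added restriction: a constant function is not canalizing under this definition.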


The only difference in our definition is the added restriction that $g$ cannot be the constant function $b$.
In other words, we require a canalizing function to be essential in its canalizing variable. The original
definition was motivated by the stability of canalizing functions while our definition tries to capture the
dominance of the canalizing variable. At first glance, our additional restriction might seem artificial or
insignificant. However, it is unequivocally more natural when considering the algebraic structure of Boolean
functions, which is at the heart of the stratification derived in this paper.

In Definition 2.2, when the canalizing variable does not receive its canalizing input $a$, the function $g$
obtained by plugging in $x_i = \bar{a}$ can be an arbitrary Boolean function. To better model a dynamically
stable network, in [KPST03] Kauffman proposed that in this case, there should be another variable $x_j$ that
is canalizing for a particular input, and so on. This leads to the following definition, where $\sigma$ is a total
ordering, or permutation, of $[n] := \{1,\dots,n\}$. We write this as $\sigma = \sigma(1), \sigma(2), \dots, \sigma(n)$, and say that
$\sigma \in \mathfrak{S}_n$, the symmetric group on $[n]$.


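Definition 2.3. A Boolean function $f\colon \mathbb{F}_2^n \to \mathbb{F}_2$ is nested canalizing with respect to the permutation
$\sigma \in \mathfrak{S}_n$, inputs $a_i$ and outputs $b_i$, for $i = 1,2,\dots,n$, if it can be represented in the form:

$$f(x_1,\dots,x_n) = \begin{cases}
b_1 & x_{\sigma(1)} = a_1,\\
b_2 & x_{\sigma(1)} \neq a_1,\ x_{\sigma(2)} = a_2,\\
b_3 & x_{\sigma(1)} \neq a_1,\ x_{\sigma(2)} \neq a_2,\ x_{\sigma(3)} = a_3,\\
\ \vdots & \qquad\vdots\\
b_n & x_{\sigma(1)} \neq a_1,\ \dots,\ x_{\sigma(n-1)} \neq a_{n-1},\ x_{\sigma(n)} = a_n,\\
\bar{b}_n & x_{\sigma(1)} \neq a_1,\ \dots,\ x_{\sigma(n-1)} \neq a_{n-1},\ x_{\sigma(n)} \neq a_n.
\end{cases} \tag{1}$$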


The idea of nested canalizing is that, in some sense, such a function is “recursively canalizing” for exactly n steps. As
an analogy, one can consider a nested canalizing function as an onion. We can peel off variables one at a
time by not taking the canalizing input of each variable (i.e., by plugging in $x_{\sigma(i)} = \bar{a}_i$). Before we peel off
the ‘inner’ variables, we need to peel off the ‘outer’ variables first. In the end, we are left with the constant
function $\bar{b}_n$. We will return to this onion analogy several times throughout this paper to highlight our main
ideas.


Remark 2.4. Since $b_n \neq \bar{b}_n$, a nested canalizing function is essential in all n variables.

If a Boolean function is nested canalizing, then at least one of the $n!$ orderings of the variables yields
an equation in the form of Eq. (1). Note that such variable orderings are not unique, and the number of
such orderings depends on the function $f$. For example, we can write the function $f_1(x,y,z) = xyz$ as in
Eq. (1) using any of the 6 orderings of the variables $\{x,y,z\}$. In contrast, for $f_2(x,y,z) = x(yz+1)$, only
2 orderings would work, namely $(x,y,z)$ and $(x,z,y)$.
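
The two counts in this example can be reproduced by brute force. The following sketch assumes functions are given as Python callables on bit tuples with variables (x, y, z) indexed 0, 1, 2; the helper names `is_nested_canalizing` and `count_orderings` are hypothetical.

```python
from itertools import permutations, product

def is_nested_canalizing(f, n, order):
    """Check Definition 2.3 for one variable order (0-based indices)."""
    fixed = {}  # variables already set to their non-canalizing inputs
    for i in order:
        points = [x for x in product((0, 1), repeat=n)
                  if all(x[j] == v for j, v in fixed.items())]
        for a in (0, 1):
            on = {f(x) for x in points if x[i] == a}
            off = {f(x) for x in points if x[i] != a}
            if len(on) == 1 and off != on:   # constant on x_i = a, but g is not that constant
                fixed[i] = 1 - a
                break
        else:
            return False                     # x_i is not canalizing at this step
    return True

def count_orderings(f, n):
    """Number of variable orders for which f has the form of Eq. (1)."""
    return sum(is_nested_canalizing(f, n, p) for p in permutations(range(n)))

f1 = lambda x: x[0] & x[1] & x[2]            # xyz
f2 = lambda x: x[0] & (1 ^ (x[1] & x[2]))    # x(yz + 1)
print(count_orderings(f1, 3), count_orderings(f2, 3))  # 6 2
```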


3. k-Canalizing Functions


Nested canalizing functions have a very restrictive structure and become increasingly sparse as the 
number of input variables increases [JRL07]. In a real network model, it is often the case that not all 
variables exhibit nested canalizing behavior. Moreover, the first several canalizing variables play more 
central roles than the remaining variables. Thus, it is natural to consider functions that are canalizing,
but not nested canalizing. For example, one function in the segment polarity gene network of Albert and
Othmer’s seminal paper [AO03] is canalizing but not nested canalizing. For another example, one can look
at the lactose (lac) operon, which regulates the transport and metabolism of lactose in Escherichia coli. In
[RH13], a simple Boolean network model of the lac operon was proposed, where the regulatory function
for lactose was

$$f_L(t+1) = \overline{G_e} \wedge \big[(L \wedge \overline{E}) \vee (L_e \wedge E)\big].$$


In a sentence, this means “internal lactose ($L$) will be present the following timestep if there is no external
glucose ($G_e$), and at least one of the following holds:

■ there already is internal lactose present, but the enzyme β-galactosidase ($E$) that breaks it down is
absent;

■ there is external lactose ($L_e$) available and the lac permease transporter protein (also represented
by $E$ since it is transcribed by the same gene) is present.”

The variable $G_e$ (though sometimes considered a parameter) is canalizing because it acts as a “shut-down”
switch: if $G_e = 1$, then $f_L = 0$ regardless of the other variables. In other words, we can write this as

$$f_L(G_e, L_e, L, E) = \begin{cases} 0 & G_e = 1,\\ (L \wedge \overline{E}) \vee (L_e \wedge E) & G_e = 0. \end{cases}$$

The function $g = (L_e \wedge E) \vee (L \wedge \overline{E})$ is not canalizing, and so the 4-variable function $f_L$ is canalizing but
not nested canalizing. In the framework that we are about to define, this function has canalizing depth 1.
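
The claims about the lac operon function can be verified on its truth table. This is a sketch under stated assumptions: the variables are ordered (G_e, L_e, L, E), negation is written as `1 - x`, and the helper `canalizing_variables` is a hypothetical name that applies Definition 2.2 by brute force.

```python
from itertools import product

def canalizing_variables(f, n):
    """Indices i such that f is canalizing in x_i in the sense of Definition 2.2."""
    points = list(product((0, 1), repeat=n))
    found = []
    for i in range(n):
        for a in (0, 1):
            on = {f(x) for x in points if x[i] == a}
            off = {f(x) for x in points if x[i] != a}
            if len(on) == 1 and off != on:
                found.append(i)
                break
    return found

# Variable order (G_e, L_e, L, E).
f_L = lambda x: (1 - x[0]) & ((x[2] & (1 - x[3])) | (x[1] & x[3]))
# Restriction to G_e = 0, as a function of (L_e, L, E).
g = lambda y: f_L((0,) + y)

print(canalizing_variables(f_L, 4))  # [0]
print(canalizing_variables(g, 3))    # []
```

Only G_e is canalizing in f_L, and no variable is canalizing in g, which is consistent with canalizing depth 1.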

Due to both theoretical and practical reasons, a relaxation of the nested canalizing structure is often
necessary. This was done in [LDM12], where the authors defined partially nested canalizing functions,
and then distinguished between the “active depth” and “full depth” of a function. Our definition of
k-canalizing functions is similar to what it means in their paper to be “partially nested canalizing of active
depth at least k.” As before, the small differences are motivated by the desire to have a natural unique
algebraic form.


Definition 3.1. A Boolean function $f(x_1,\dots,x_n)$ is k-canalizing, where $0 \le k \le n$, with respect to the
permutation $\sigma \in \mathfrak{S}_n$, inputs $a_i$, and outputs $b_i$, for $1 \le i \le k$, if

$$f(x_1,\dots,x_n) = \begin{cases}
b_1 & x_{\sigma(1)} = a_1,\\
b_2 & x_{\sigma(1)} \neq a_1,\ x_{\sigma(2)} = a_2,\\
b_3 & x_{\sigma(1)} \neq a_1,\ x_{\sigma(2)} \neq a_2,\ x_{\sigma(3)} = a_3,\\
\ \vdots & \qquad\vdots\\
b_k & x_{\sigma(1)} \neq a_1,\ \dots,\ x_{\sigma(k-1)} \neq a_{k-1},\ x_{\sigma(k)} = a_k,\\
g \not\equiv b_k & x_{\sigma(1)} \neq a_1,\ \dots,\ x_{\sigma(k-1)} \neq a_{k-1},\ x_{\sigma(k)} \neq a_k,
\end{cases} \tag{2}$$

where $g = g(x_{\sigma(k+1)},\dots,x_{\sigma(n)})$ is a Boolean function on $n - k$ variables. When $g$ is not a canalizing
function, the integer k is the canalizing depth of $f$. Furthermore, if $g$ is not a constant function, then we
call it a core function of $f$, denoted by $f_C$.


As with canalizing and nested canalizing functions, the $g \not\equiv b_k$ condition ensures that $f$ is essential in
the final variable, $x_{\sigma(k)}$.

Remark 3.2. Since $g \not\equiv b_k$, a function $f$ that is k-canalizing with respect to $\sigma \in \mathfrak{S}_n$, inputs $a_i$ and outputs
$b_i$ is essential in each $x_{\sigma(i)}$ for $i = 1,\dots,k$.

The representation of a k-canalizing function $f$ in the form of Eq. (2), even when k is the canalizing
depth, is generally not unique since it depends on the variable ordering. However, we will prove that several
key properties, such as the canalizing depth and core function $f_C = g$ (if there is one), are independent of
representation. It is worth noting that if $g$ is constant, then $g$ need not be unique, i.e., both $g = 0$ and $g = 1$
can arise. This is why we do not allow constant core functions. The following observation is elementary.


Remark 3.3. If $f$ is k-canalizing with respect to $\sigma \in \mathfrak{S}_n$, inputs $a_i$ and outputs $b_i$, then any initial segment
$x_{\sigma(1)},\dots,x_{\sigma(j)}$ with the same canalized output $b_1 = \cdots = b_j$ can be permuted to yield an equivalent form
as in Eq. (2).

Definition 3.4. If $f(x_1,\dots,x_n)$ is k-canalizing with respect to $\sigma \in \mathfrak{S}_n$, inputs $a_i$ and outputs $b_i$, then for
each $j \le k$, define the Boolean function $g_j^{\sigma}(x_{\sigma(j+1)},\dots,x_{\sigma(n)})$ to be the result of plugging in $x_{\sigma(i)} = \bar{a}_i$
for $i = 1,\dots,j$.

In plain English, the function $g_j^{\sigma}$ is the result when the first $j$ canalizing variables do not get their
canalizing inputs. We can now show that the canalizing depth k and the core function $f_C$ are independent
of the order of the variables. Moreover, the ambiguity of variable orderings is well-controlled in that they
are partitioned into blocks called layers via extended monomials, and variables can be permuted arbitrarily
if and only if they lie in the same layer. This generalizes the observation in Remark 3.3.


Proposition 3.5. Suppose an n-variable Boolean function $f$ is k-canalizing with respect to the permutation
$\sigma$, inputs $a_i$ and outputs $b_i$, for $1 \le i \le k$, and $k'$-canalizing with respect to the permutation $\sigma'$, inputs $a'_j$
and outputs $b'_j$, for $1 \le j \le k'$, such that both $g$ and $g'$, obtained by substituting $\bar{a}_i$ for $x_{\sigma(i)}$ and $\bar{a}'_j$ for
$x_{\sigma'(j)}$ respectively, are not canalizing. Then $k = k'$ and the resulting core functions, if they exist, are the
same.


Proof. Assume $f$ is canalizing, because otherwise, $k = k' = 0$ and the result is trivial. Without losing
generality we can assume $\sigma(1) \neq \sigma'(1)$, since if this were not the case, we could simply input $\bar{a}_1 = \bar{a}'_1$
for $x_{\sigma(1)} = x_{\sigma'(1)}$ and consider $g_1^{\sigma} = g_1^{\sigma'}$. (Note that if $\sigma(1) = \sigma'(1)$ and $a_1 \neq a'_1$ then $b_1 \neq b'_1$, which
means that $f$ is completely determined by the input to $x_{\sigma(1)} = x_{\sigma'(1)}$. In this case, $f$ has only one essential
variable, and so $k = 1$. Moreover, both $g_1^{\sigma}$ and $g_1^{\sigma'}$ are constant functions. Thus $f$ has no core function.)

Since $g'$ is non-canalizing, it is not essential in $x_{\sigma(1)}$, and thus $\sigma(1) = \sigma'(j^*)$ for some $1 \le j^* \le k'$. We
claim that we may assume without loss of generality that $a'_{j^*} = a_1$ and $b'_{j^*} = b_1$. To see why, first suppose
that $a'_{j^*} = \bar{a}_1$ and consider the two possible inputs to $x_{\sigma'(j^*)} = x_{\sigma(1)}$ in the function $g_{j^*-1}^{\sigma'}$. If this variable
takes its canalizing input $\bar{a}_1$, then the output is $b'_{j^*}$. However, since $f$ is canalizing in $x_{\sigma(1)} = x_{\sigma'(j^*)}$, then
the other input $a_1$ would yield the output $b_1$. In other words, $g_{j^*-1}^{\sigma'}$ is completely determined by the input
to $x_{\sigma'(j^*)}$, so all subsequent variables are fictitious. Therefore, $g_{j^*}^{\sigma'} = g'$ must be constant, hence $j^* = k'$.
Moreover, this function must be $g' = b_1$ because it only arises when $x_{\sigma'(j^*)} = x_{\sigma(1)}$ takes the canalizing
input $a_1$. Since $f$ is essential in $x_{\sigma'(j^*)} = x_{\sigma(1)}$, Remark 3.2 implies that $b'_{j^*} = \bar{b}_1$, the opposite value
of $g' = b_1$. Thus, we have two equivalent ways to represent $g_{j^*-1}^{\sigma'} = g_{k'-1}^{\sigma'}$:

$$g_{k'-1}^{\sigma'} = \begin{cases} \bar{b}_1 & x_{\sigma'(k')} = \bar{a}_1,\\ g' = b_1 & x_{\sigma'(k')} = a_1, \end{cases}
\;=\; \begin{cases} b_1 & x_{\sigma'(k')} = a_1,\\ \bar{b}_1 & x_{\sigma'(k')} = \bar{a}_1. \end{cases} \tag{3}$$

In other words, switching the triple of values $(a'_{k'}, b'_{k'}, g')$ from $(\bar{a}_1, \bar{b}_1, b_1)$ to $(a_1, b_1, \bar{b}_1)$ in the original
representation of $f$ with respect to $\sigma' \in \mathfrak{S}_n$ does not change the function, so we may assume that $a'_{j^*} = a_1$
and $b'_{j^*} = b_1$, as claimed. The proof for the case when $b'_{j^*} = \bar{b}_1$ is almost the same.

Since $f$ is canalizing in $x_{\sigma'(j^*)} = x_{\sigma(1)}$ with input $a_1$ and output $b_1$, we must also have $b'_j = b_1$ for all
$1 \le j \le j^*$. By Remark 3.3, we can create a new permutation $\sigma''$ by swapping the order of $x_{\sigma'(1)}$ and
$x_{\sigma'(j^*)}$ in $\sigma'$. Clearly, $f$ is $k'$-canalizing with respect to $\sigma''$ and $g_{k'}^{\sigma'} = g_{k'}^{\sigma''}$. Since $x_{\sigma(1)} = x_{\sigma''(1)}$, the
result follows from induction on $g_1^{\sigma} = g_1^{\sigma''}$. We conclude that $k = k'$.

Finally, we need to show that when $f$ has a core function $f_C$, it is unique. The non-canalizing functions
$g$ and $g'$ are essential in the same set of variables. If they are both constant functions, then they actually
need not be the same, due to the different ways to write $g'$ as in Eq. (3). Otherwise, they are core functions
for $f$, and are obtained by substituting the same set of inputs for the same set of variables, thus we must have
$f_C = g = g'$. □


It is worth noting that Definition 3.1 is similar to the definition of k-partially nested canalizing functions
(k-PNCFs) in [LDM12]. In fact, these two definitions share the same motivation but come from different
perspectives. In [LDM12], the authors treat k-PNCFs as a subclass of Boolean functions, while we prefer
to consider canalization as a property of all Boolean functions, with different functions exhibiting different
extents of canalization. This provides us with a well-defined way to classify all Boolean functions on n variables.

Returning to our onion analogy, now we can think of all Boolean functions as onions. For each Boolean
function, we can try to peel off its variables as we did for nested canalizing functions. We will have to stop
once we get to a non-canalizing function. In this sense, nested canalizing functions would be the ‘best’
onions since we can peel off all the variables, and non-canalizing functions would be the ‘worst’. The k-canalizing
functions would be those from which at least k variables can be peeled off. Though a unique core function
$f_C = g$ only exists when $g$ is non-constant, we will soon see how every Boolean function, whether or not
it has a core function, has a unique core polynomial that extends the notion of a core function.


Example 3.6. The Boolean function $f(x,y,z,w) = xy(z+w)$ has canalizing depth 2 and core function
$f_C = z + w$.
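
The peeling procedure behind canalizing depth can be sketched as follows. This is an illustrative brute-force implementation, not the authors' method: it greedily picks any canalizing variable and substitutes its non-canalizing input, relying on Proposition 3.5 for the fact that the resulting depth does not depend on the choices made. The names `find_canalizing` and `canalizing_depth` and the 0-based indexing of (x, y, z, w) are assumptions.

```python
from itertools import product

def find_canalizing(f, n, fixed):
    """Return (i, a) with x_i canalizing (input a) once `fixed` is substituted, else None."""
    points = [x for x in product((0, 1), repeat=n)
              if all(x[j] == v for j, v in fixed.items())]
    for i in range(n):
        if i in fixed:
            continue
        for a in (0, 1):
            on = {f(x) for x in points if x[i] == a}
            off = {f(x) for x in points if x[i] != a}
            if len(on) == 1 and off != on:
                return i, a
    return None

def canalizing_depth(f, n):
    """Peel canalizing variables one at a time; return (depth, substituted inputs)."""
    fixed = {}
    while (step := find_canalizing(f, n, fixed)) is not None:
        i, a = step
        fixed[i] = 1 - a          # plug in the non-canalizing input
    return len(fixed), fixed

f = lambda x: x[0] & x[1] & (x[2] ^ x[3])    # xy(z + w)
depth, fixed = canalizing_depth(f, 4)
print(depth, fixed)                           # 2 {0: 1, 1: 1}
```

With x = y = 1 substituted, the remaining function of (z, w) is z + w, matching the stated core function.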


Remark 3.7. In our framework, if we consider the set of all Boolean functions on n variables, then:

■ The canalizing depth of a k-canalizing function is at least k.

■ A non-canalizing function has canalizing depth 0, and if it is non-constant, then its core function
is itself.

■ Every Boolean function is 0-canalizing.

■ The 1-canalizing functions are precisely the canalizing functions.

■ The n-canalizing functions are precisely the nested canalizing functions.

■ If a function $f$ has canalizing depth k and the resulting $g$ is constant, then $f$ has $n - k$ fictitious
variables, and it is a nested canalizing function on its k essential variables.


4. Characterizations of k-Canalizing Functions

4.1. Polynomial Form of k-Canalizing Functions. It is well-known [LNC96] that any Boolean function
$f$ can be uniquely expressed as a square-free polynomial, called its algebraic normal form. Equivalently,
the set of Boolean functions on n variables is isomorphic to the quotient ring $R := \mathbb{F}_2[x_1,\dots,x_n]/I$,
where $I = \langle x_i^2 - x_i : 1 \le i \le n \rangle$. Henceforth in this section, when we speak of Boolean polynomials,
we assume they are square-free. Additionally, we will define $\hat{x}_i := (x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$ for
notational convenience. In this section, we will extend work on NCFs from [LAM+13] to general
k-canalizing Boolean functions.


Lemma 4.1. A Boolean function $f(x_1,\dots,x_n)$ is canalizing in variable $x_i$, for some $1 \le i \le n$, with
input $a_i$ and output $b_i$, if and only if

$$f = (x_i + a_i)\, g(\hat{x}_i) + b_i,$$

for some polynomial $g \neq 0$.

Proof. Suppose $f$ is canalizing in $x_i$. Write $f$ in its algebraic normal form and factor it as

$$f = x_i\, q(\hat{x}_i) + r(\hat{x}_i),$$

where $q$ and $r$ are the quotient and remainder of $f$ when divided by $x_i$. Note that $b_i = a_i q(\hat{x}_i) + r(\hat{x}_i)$,
and since $a_i + a_i = 0$ in $\mathbb{F}_2$,

$$f = (x_i + a_i)\, q(\hat{x}_i) + [r(\hat{x}_i) + a_i q(\hat{x}_i)] = (x_i + a_i)\, g(\hat{x}_i) + b_i.$$

The function $g(\hat{x}_i) := q(\hat{x}_i)$ is nonzero because $f$ is essential in $x_i$. This establishes necessity, and
sufficiency is obvious. □
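
The factorization in Lemma 4.1 can be tested pointwise. Plugging the non-canalizing input into f = (x_i + a_i) g + b_i shows that g must equal the restriction of f to that input, plus b_i. The sketch below uses this to check the identity on every input; it is an illustration only, the name `satisfies_lemma_4_1` is hypothetical, and the condition that g is nonzero is not checked.

```python
from itertools import product

def satisfies_lemma_4_1(f, n, i, a, b):
    """Check f = (x_i + a) * g + b over F_2, with g = f restricted to x_i = a + 1, plus b."""
    for x in product((0, 1), repeat=n):
        y = x[:i] + (1 - a,) + x[i + 1:]   # same point with x_i at its non-canalizing input
        g = f(y) ^ b
        if f(x) != ((x[i] ^ a) & g) ^ b:
            return False
    return True

f_or = lambda x: x[0] | x[1]     # equals (x0 + 1)(x1 + 1) + 1 over F_2
f_xor = lambda x: x[0] ^ x[1]    # not canalizing in any variable
print(satisfies_lemma_4_1(f_or, 2, 0, 1, 1))   # True
print(satisfies_lemma_4_1(f_xor, 2, 0, 0, 0))  # False
```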


By applying the above lemma recursively, we get the following theorem.

Theorem 4.2. A Boolean function $f(x_1,\dots,x_n)$ is k-canalizing, with respect to permutation $\sigma \in \mathfrak{S}_n$,
inputs $a_i$ and outputs $b_i$, for $1 \le i \le k$, if and only if it has the polynomial form

$$f(x_1,\dots,x_n) = (x_{\sigma(1)} + a_1)\, g(\hat{x}_{\sigma(1)}) + b_1,$$

where

$$g(\hat{x}_{\sigma(1)}) = (x_{\sigma(2)} + a_2)\big[\cdots\big[(x_{\sigma(k-1)} + a_{k-1})\big[(x_{\sigma(k)} + a_k)\,\bar{g} + \Delta b_{k-1}\big] + \Delta b_{k-2}\big]\cdots\big] + \Delta b_1$$

for some polynomial $\bar{g} = \bar{g}(x_{\sigma(k+1)},\dots,x_{\sigma(n)}) \neq 0$, where $\Delta b_i := b_{i+1} - b_i = b_{i+1} + b_i$. □

4.2. Dominance Layers of Boolean Functions. One weakness of Theorem 4.2 is that given a Boolean
function $f$, the representation of $f$ in the above form, even when k is exactly the canalizing depth, is
not unique. In a k-canalizing function, some variables are “more dominant” than others. We will
classify all variables of a Boolean function into different layers according to the extent of their dominance,
extending work from [LAM+13] from NCFs to general Boolean functions. The “most dominant”
variables will be precisely those that are canalizing. Recall that we are always working in the quotient ring
$R = \mathbb{F}_2[x_1,\dots,x_n]/I$, though at times it is helpful to consider the algebraic normal form of a polynomial
as an element of $\mathbb{F}_2[x_1,\dots,x_n]$.

Definition 4.3. A Boolean function $M(x_1,\dots,x_m)$ is an extended monomial in variables $x_1,\dots,x_m$ if

$$M(x_1,\dots,x_m) = \prod_{i=1}^{m} (x_i + a_i),$$

where $a_i \in \mathbb{F}_2$ for each $i = 1,\dots,m$.

An extended monomial in $R$ is an extended monomial of a subset of $\{x_1,\dots,x_n\}$. In other words, it is
simply a product $\prod_{i=1}^{n} y_i$ where each $y_i$ is either $x_i$, $\bar{x}_i$, or 1. Using extended monomials, we can refine
Theorem 4.2 to obtain a unique extended monomial form of any Boolean function.

Proposition 4.4. Given a Boolean function $f(x_1,\dots,x_n)$, all variables are canalizing if and only if $f =
M(x_1,\dots,x_n) + b$, where $M$ is an extended monomial in all variables.

Proof. Suppose all n variables are canalizing in $f$, and so $f$ is essential in every variable. Since $x_1$ is
canalizing, Lemma 4.1 says that $f = (x_1 + a_1)\, g(\hat{x}_1) + b$ for some $a_1, b \in \mathbb{F}_2$, and $g \neq 0$. In particular,
this means that $(x_1 + a_1) \mid (f + b)$ in $\mathbb{F}_2[x_1,\dots,x_n]$. Since $x_2$ is also canalizing, $f(x_1, a_2, x_3, \dots, x_n) = b'$
for some $a_2$ and $b'$. Plugging in $x_1 = a_1$ yields $f(a_1, a_2, x_3, \dots, x_n) = b = b'$, and so

$$(x_2 + a_2) \mid (f + b) = (x_1 + a_1)\, g(x_2,\dots,x_n).$$

Since $x_1 + a_1$ and $x_2 + a_2$ are co-prime, we get $(x_2 + a_2) \mid g(x_2,\dots,x_n)$. Note that $g(\hat{x}_1) \neq 0$,
hence, we have $g(\hat{x}_1) = (x_2 + a_2)\, g'(x_3,\dots,x_n)$ where $g'(x_3,\dots,x_n) \neq 0$. Thus we have $f = (x_1 +
a_1)(x_2 + a_2)\, g'(x_3,\dots,x_n) + b$. Necessity of the proposition now follows from induction, and sufficiency
is obvious. □


We are now ready to prove the main result of this section. This is a generalized version of Theorem 4.2
in [LAM+13]. We will obtain a new extended monomial form of a Boolean function $f$ by induction. In
this form, all variables will be classified into different layers according to their dominance. The canalizing
variables are the most dominant variables. Thus, a Boolean function may have one, none, or many “most
dominant” variables. As in [LAM+13], variables in the same layer will have the same level of dominance,
with the variables in the outer layers being “more dominant” than those in the inner layers.


Theorem 4.5. Every Boolean function $f(x_1,\dots,x_n) \neq 0$ can be uniquely written as

$$f(x_1,\dots,x_n) = M_1(M_2(\cdots(M_{r-1}(M_r\, p_C + 1) + 1)\cdots) + 1) + b, \tag{4}$$

where each $M_i = \prod_{j=1}^{k_i} (x_{i_j} + a_{i_j})$ is a nonconstant extended monomial, $p_C \neq 0$ is the core polynomial
of $f$, and $k = \sum_{i=1}^{r} k_i$ is the canalizing depth. Each $x_i$ appears in exactly one of $\{M_1,\dots,M_r,p_C\}$, and the
only restrictions on Eq. (4) are the following “exceptional cases”:

(i) If $p_C = 1$ and $r \neq 1$, then $k_r \ge 2$;

(ii) If $p_C = 1$ and $r = 1$ and $k_1 = 1$, then $b = 0$.

When $f$ is a non-canalizing function, we simply have $p_C = f$.

Before we prove Theorem 4.5, we will define some terms and examine a few details, such as the subtle
difference between the core function and core polynomial, and the “exceptional cases”, by simple
examples. This should help elucidate the more technical parts of the proof.

Definition 4.6. A Boolean function $f$ written in its unique form from Eq. (4) is said to be in standard
monomial form, and $r$ is its layer number. The $i$th dominance layer of $f$, denoted $L_i$, is the set of essential
variables of $M_i$. The set of essential variables of $p_C$ is denoted $L_\infty$, and these are called the recessive
variables of $f$.


As we will see, when $f$ has a core function $f_C$, its core polynomial is either $p_C = f_C$ or $p_C = f_C + 1$.
When the number of “+1”s that appear in Eq. (4), possibly including $b$, is even, we have $p_C = f_C$.
Otherwise, we have $p_C = f_C + 1$. When a Boolean function $f$ with canalizing depth $k > 0$ fails to have
a core function, i.e., the remaining function is either $g = 0$ or $g = 1$, then $f$ is in fact a nested canalizing
function on k variables, and its core polynomial is simply $p_C = 1$.

Finally, we will examine the two “exceptional cases”. Both of these are necessary to avoid
double-counting certain functions and ensure uniqueness, as claimed in Theorem 4.5.

(i) If $p_C = 1$ and $r \neq 1$: in this case, if $k_r = 1$, then $M_r = x_i$ or $\bar{x}_i$, for some $i$. In either case, this
innermost layer can be “absorbed” into the extended monomial $M_{r-1}$. For example, if $M_r = x_i$,
then the inner two layers are

$$M_{r-1}(M_r + 1) + 1 = M_{r-1}(x_i + 1) + 1 = (x_i + 1) \prod_{j=1}^{k_{r-1}} (x_{i_j} + a_{i_j}) + 1 = M'_{r-1} + 1,$$

where $M'_{r-1} = \bar{x}_i M_{r-1}$ is an extended monomial. Thus, in this case we may assume that the
innermost layer has at least two essential variables, hence $k_r \ge 2$.

(ii) If $p_C = 1$ and $r = 1$ and $k_1 = 1$, then for some $i$, either $f = x_i + b$, or $f = \bar{x}_i + b$. Clearly, there
are only two such functions, either $f = x_i$ or $f = \bar{x}_i$, and so allowing both $b = 0$ and $b = 1$ would
double-count these. Thus, we may assume that $b = 0$.

Proof of Theorem 4.5. For any non-canalizing function $f \neq 0$, $f = p_C$ and the uniqueness is obvious.

When $f$ is canalizing, we induct on n. When $n = 1$, there are 2 canalizing functions, namely $x = (x) \cdot 1$
and $x + 1 = (x + 1) \cdot 1$, both satisfying Eq. (4). For these 2 functions, since $p_C = 1$, $r = 1$ and $k_1 = 1$,
we must have $b = 0$, so the previous representation is also unique.

When $n = 2$, there are 12 canalizing functions, 4 of which are essential in 1 variable, and thus can
be uniquely written as in Eq. (4). Now let us consider the 8 canalizing functions that are essential in 2
variables. It is easy to check that for all of these, both variables $x_1$ and $x_2$ are canalizing. Then by Proposition 4.4,
all of them are of the form

$$(x_1 + a_1)(x_2 + a_2) + b = M_1 p_C + b,$$

where $M_1 = (x_1 + a_1)(x_2 + a_2)$ and $p_C = 1$. In this case, we have $r = 1$ and $k_1 = 2$. Note that when
$p_C = 1$, the innermost layer must have at least two essential variables, so uniqueness holds. We have
proved that Eq. (4) holds for $n = 1$ and $n = 2$.

Assume now that Eq. (4) is true for any canalizing function that is essential in at most $n - 1$ variables.
Consider a canalizing function $f(x_1,\dots,x_n)$. Suppose that $x_{1_j}$ for each $j = 1,\dots,k_1$ are all canalizing in
$f$. With the same argument as in Proposition 4.4, we get $f = M_1 g + b$, where $M_1 = (x_{1_1} + a_{1_1}) \cdots (x_{1_{k_1}} +
a_{1_{k_1}})$ and $g \neq 0$. If $g$ is non-canalizing, then Eq. (4) holds with $p_C = g$ and $r = 1$. If $g$ is canalizing, then
it is a canalizing function that is essential in at most $n - k_1 \le n - 1$ variables. By our induction hypothesis,
it can be uniquely written as

$$g = M_2(M_3(\cdots(M_{r-1}(M_r\, p_C + 1) + 1)\cdots) + 1) + b'.$$

Note that $b'$ must be 1, otherwise all variables in $M_2$ will also be most dominant variables of $f$. This
completes the proof. □


Remark 4.7. For any Boolean function $f$:

(i) Variables in two consecutive layers have different canalized outputs.

(ii) $L_1$ consists of all the most dominant variables (canalizing variables) of $f$.


Let us return to our onion analogy, where previously we were peeling off one variable at a time.
Furthermore, imagine that each individual variable layer is white if the canalized output $b_i = 0$, and black if
$b_i = 1$. Thus, we can think of an extended monomial layer $L_i$ as a maximal block of variable layers of the
same color. We can “peel off” an entire $L_i$ at once by plugging in the non-canalizing input $x_{i_j} = \bar{a}_{i_j}$ for
each variable in $L_i$. In other words, we can peel off all black layers, then all white layers, then all black
layers, and so on. Moreover, we can read the colors directly off of the function if it is written in the
form of Eq. (2). However, recall that this form of a k-canalizing function, where $g$ is non-canalizing, is
not unique. By Theorem 4.5, the order of consecutive variables, $x_{\sigma(i)}$ and $x_{\sigma(i+1)}$, can be transposed if
and only if they are in the same $L_j$. Based on this property, we can enumerate Boolean functions on n
variables with canalizing depth k. Roughly speaking, we will do this by counting the number of different
layer structures, and then counting the number of (non-canalizing) core polynomials. This last set is just
the complement of the set of canalizing functions on those variables, which were enumerated in [JSK04].


Example 4.8. The Boolean function $f(x_1, \dots, x_7) = x_1 \overline{x}_2 (x_3 x_4 (x_5 + x_6 + x_7 + 1) + 1)$ has canalizing
depth 4. With respect to the permutation $\sigma = 1, 2, 3, 4$, its canalizing inputs are $(a_i)_{i=1}^4 = (0, 1, 0, 0)$,
outputs $(b_i)_{i=1}^4 = (0, 0, 1, 1)$ and the core polynomial is $p_C = x_5 + x_6 + x_7$.
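
The data in Example 4.8 can be checked by brute force. The Python sketch below is not part of the paper: the names `example_4_8`, `restricted_values` and `peel` are hypothetical, and it assumes the example function reads $x_1 (x_2 + 1)(x_3 x_4 (x_5 + x_6 + x_7 + 1) + 1)$ over $\mathbb{F}_2$. It peels canalizing variables one at a time in a greedy order (lowest index first) by plugging in the non-canalizing input, and stops when the remaining function is constant or non-canalizing. It does not verify that other peeling orders give the same count.

```python
from itertools import product

def example_4_8(x):
    # Assumed reading of Example 4.8 over F_2:
    # f = x1 * (x2 + 1) * (x3 * x4 * (x5 + x6 + x7 + 1) + 1)
    x1, x2, x3, x4, x5, x6, x7 = x
    return x1 * (x2 ^ 1) * ((x3 * x4 * (x5 ^ x6 ^ x7 ^ 1)) ^ 1)

def restricted_values(f, n, fixed):
    """Set of values f takes when the variables in `fixed` are held constant."""
    free = [i for i in range(n) if i not in fixed]
    values = set()
    for bits in product((0, 1), repeat=len(free)):
        point = dict(fixed)
        point.update(zip(free, bits))
        values.add(f(tuple(point[i] for i in range(n))))
    return values

def peel(f, n):
    """Greedily peel canalizing variables; return (order, inputs, outputs)."""
    fixed = {}
    order, inputs, outputs = [], [], []
    while len(restricted_values(f, n, fixed)) > 1:  # stop once constant
        step = None
        for i in (j for j in range(n) if j not in fixed):
            for a in (0, 1):
                values = restricted_values(f, n, {**fixed, i: a})
                if len(values) == 1:      # x_i = a forces a constant output
                    step = (i, a, values.pop())
                    break
            if step is not None:
                break
        if step is None:                  # remaining function is non-canalizing
            break
        i, a, b = step
        order.append(i + 1)               # report 1-based variable indices
        inputs.append(a)
        outputs.append(b)
        fixed[i] = 1 - a                  # plug in the non-canalizing input
    return order, inputs, outputs

order, inputs, outputs = peel(example_4_8, 7)
print(len(order), order, inputs, outputs)
```

Under the assumed reading of $f$, the sketch reports four peeled variables, with the same inputs and outputs as listed in the example.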


5. Enumeration of $k$-canalizing functions


Let $B(n, k)$ be the number of Boolean functions on $n$ variables with canalizing depth exactly $k$. Exact
formulas are known for $B(n, k)$ in a few special cases. The number of nested canalizing functions is
$B(n, n)$. A recurrence for this was independently derived in the 1970s by engineers studying unate cascade
functions [BB78, SK79], and then a closed formula was found by mathematicians studying NCFs
[LAM+13]. The quantity $B(4, k)$ was recently computed in [RDC14]. In this section, we will present a
general formula for $B(n, k)$.

Theorem 4.5 indicates that we can construct a Boolean function with canalizing depth $k$ by adding
extended monomial layers to a non-canalizing function on $n - k$ variables. Moreover, the non-canalizing
functions are the complement of the set of canalizing functions. Hence, let us begin with a formula for $C_n$,
the number of canalizing functions on $n$ variables. This result was derived in [JSK04] using a probabilistic
method. We will include an alternative combinatorial proof using the truth table of a Boolean function $f$.
This is the length-$2^n$ vector $(f(x_i))_i$, given some fixed ordering of the elements of $\mathbb{F}_2^n$.

Lemma 5.1. The number $C_n$ of canalizing Boolean functions on $n \geq 0$ variables is

\[ C_n = 2\bigl((-1)^n - n - 1\bigr) + \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k+1} 2^{2^{n-k}}. \]


Proof. We wish to count the number of Boolean functions that are canalizing in at least one variable. We can
construct a truth table of a Boolean function that is canalizing in at least $k$ variables by doing the following.
First, pick $k$ variables to be canalizing; there are $\binom{n}{k}$ ways to do this. Next, pick the canalizing input for
each canalizing variable; there are $2^k$ ways to do that. Then, fill out the entries in the truth table of these
canalizing inputs with the same canalized output; there are 2 ways to do that. The remaining table has $2^{n-k}$
entries, so there are $2^{2^{n-k}} - 1$ ways to fill it out such that the corresponding function is non-constant. By
inclusion-exclusion, we have $\sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k+1} (2^{2^{n-k}} - 1)$. Note that in this process, there are $2n$
functions of the form $x_i + a_i$, each being counted exactly twice, since we can pick either input as canalizing
input. Therefore we have

\[ C_n = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k+1} \bigl(2^{2^{n-k}} - 1\bigr) - 2n = 2\bigl((-1)^n - n - 1\bigr) + \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k+1} 2^{2^{n-k}}. \]

The second equality uses the binomial theorem: $\sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k+1} = -2\bigl((1-2)^n - 1\bigr) = 2 - 2(-1)^n$.

□


As examples, one can check that $C_n = 0, 2, 12, 118, 3512, \dots$ for $n = 0, 1, 2, 3, 4, \dots$. This is consistent
with the results in [JSK04], though it should be noted that all numbers differ by 2 because we do not
consider the constant functions to be canalizing.
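
The values of $C_n$ above can be reproduced numerically. The following Python sketch is an illustration rather than part of the paper, and its function names are hypothetical. It evaluates the closed formula of Lemma 5.1 and, for small $n$, compares it with a direct enumeration of all truth tables, treating a function as canalizing when it is non-constant and some single-variable restriction makes it constant.

```python
from itertools import product
from math import comb

def canalizing_count_formula(n):
    """C_n from the closed formula of Lemma 5.1."""
    tail = sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k + 1) * 2 ** (2 ** (n - k))
               for k in range(1, n + 1))
    return 2 * ((-1) ** n - n - 1) + tail

def is_canalizing(table, n):
    """table[x] is f at the point whose i-th bit is the value of variable i."""
    if len(set(table)) == 1:          # constant functions are excluded
        return False
    for i in range(n):
        for a in (0, 1):
            values = {table[x] for x in range(2 ** n) if (x >> i) & 1 == a}
            if len(values) == 1:      # fixing variable i to a forces the output
                return True
    return False

def canalizing_count_brute(n):
    """Count canalizing functions by enumerating all 2^(2^n) truth tables."""
    return sum(is_canalizing(t, n) for t in product((0, 1), repeat=2 ** n))

print([canalizing_count_formula(n) for n in range(5)])
print([canalizing_count_brute(n) for n in range(4)])
```

The brute-force count is only practical for very small $n$, since the number of truth tables is $2^{2^n}$.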

Recall that there are $2^{2^n}$ Boolean functions on $n$ variables. Since the non-canalizing functions are the
complement of the set of canalizing functions, the following is immediate.


Corollary 5.2. The number $B^*(n, 0)$ of non-constant core polynomials on $n$ variables is

\[ B^*(n, 0) = B(n, 0) - 2 = \bigl(2^{2^n} - C_n\bigr) - 2 = 2^{2^n} - 2\bigl((-1)^n - n\bigr) + \sum_{k=1}^{n} (-1)^{k} \binom{n}{k} 2^{k+1} 2^{2^{n-k}}. \]


One can check that $B^*(n, 0) = 0, 0, 2, 136, 62022, \dots$ for $n = 0, 1, 2, 3, 4, \dots$.

Before we derive the general formula for $B(n, k)$, let us first look at the special case when $k = n$. This
was computed in [LAM+13], but we include a self-contained proof. Recall that a composition of $n$ is a
sequence $k_1, \dots, k_r$ of positive integers such that $k_1 + \cdots + k_r = n$. By Theorem 4.5, the standard
extended monomial form of a Boolean function with canalizing depth $k$ involves a size-$r$ composition of $k$
with the additional property that $k_r \geq 2$.


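Lemma 5.3. For $n \geq 2$, the number $B(n, n)$ of nested canalizing functions on $n$ variables is given by:

\[ B(n, n) = 2^{n+1} \sum_{r=1}^{n-1} \; \sum_{\substack{k_1 + \cdots + k_r = n \\ k_i \geq 1,\; k_r \geq 2}} \binom{n}{k_1, \dots, k_r}, \tag{5} \]

where $\binom{n}{k_1, \dots, k_r} = \frac{n!}{k_1! \, k_2! \cdots k_r!}$.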

Proof. If a Boolean function is nested canalizing on $n$ variables, then by Theorem 4.5 we know its core
polynomial must be $p_C = 1$. Let us first fix the layer number $r$. Then for each choice of $k_1, \dots, k_r$, with
$k_1 + \cdots + k_r = n$, $k_i \geq 1$ and $k_r \geq 2$, there are $\binom{n}{k_1, \dots, k_r}$ different ways to assign $n$ variables to these $r$
layers. For each variable $x_j$, we can pick either $x_j$ or $x_j + 1$ to be in its corresponding extended monomial.
Note that we also have 2 choices for $b$. So the number of nested canalizing functions on $n$ variables with
exactly $r$ layers is given by:

\[ 2^{n+1} \sum_{\substack{k_1 + \cdots + k_r = n \\ k_i \geq 1,\; k_r \geq 2}} \binom{n}{k_1, \dots, k_r}. \]

Then by summing over all possible layer numbers $r$, for $1 \leq r \leq n - 1$, we get the formula in Eq. (5) for
$B(n, n)$. □

According to our definition, $B(1, 1) = 2$. One can also check that $B(2, 2) = 8$, $B(3, 3) = 64$,
$B(4, 4) = 736, \dots$.
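
The count in Lemma 5.3 can be evaluated directly by listing compositions. The Python sketch below is an illustration, not the authors' code, and the helper names `compositions`, `multinomial` and `nested_canalizing_count` are hypothetical. It sums the multinomial coefficients over all compositions of $n$ whose last part is at least 2; this covers every layer number $r$ from 1 to $n - 1$ at once, since a composition with $k_r \geq 2$ has at most $n - 1$ parts.

```python
from math import factorial

def compositions(n):
    """Yield every composition of n: ordered tuples of positive integers summing to n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def multinomial(parts):
    """n! / (k_1! k_2! ... k_r!) where n is the sum of the parts."""
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)
    return result

def nested_canalizing_count(n):
    """B(n, n) for n >= 2 via Eq. (5): the innermost layer needs k_r >= 2."""
    layer_structures = sum(multinomial(c) for c in compositions(n) if c[-1] >= 2)
    return 2 ** (n + 1) * layer_structures

print([nested_canalizing_count(n) for n in (2, 3, 4)])
```

For $n = 2, 3, 4$ this gives 8, 64 and 736, in agreement with the values quoted above.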

Now we are ready to derive the general formula for B(n, k). 



Theorem 5.4. The number $B(n, k)$ of Boolean functions on $n$ variables with canalizing depth $k$, for
$1 \leq k \leq n$, is

\[ B(n, k) = \binom{n}{k} \Bigl[ B(k, k) + B^*(n - k, 0) \cdot 2^{k+1} \sum_{k_1 + \cdots + k_r = k} \binom{k}{k_1, \dots, k_r} \Bigr], \]

where the sum is taken over all compositions of $k$, and the closed form of $B(k, k)$ is given by Lemma 5.3.


Proof. We can construct a Boolean function $f$ on $n$ variables with canalizing depth $k$ by doing the following.
First, pick $k$ variables that are not in the core polynomial $p_C$. There are $\binom{n}{k}$ different ways to do that.
Once we have fixed the variables that are not in $p_C$, we need to consider the following two cases:

Case 1: $p_C = 1$. Then $f$ is actually a nested canalizing function on these $k$ variables. There are $B(k, k)$
of them in total.

Case 2: $p_C \neq 1$. Then $p_C$ is a non-constant core polynomial on $n - k$ variables, so there are $B^*(n - k, 0)$
different choices for $p_C$. Using the same argument as in Lemma 5.3, there are

\[ 2^{k} \sum_{k_1 + \cdots + k_r = k} \binom{k}{k_1, \dots, k_r} \]

different ways for those $k$ variables to form the extended monomials in Eq. (4), where the sum is
taken over all compositions of $k$. Note that we also have 2 ways to pick $b$. Therefore, in this case, there are

\[ B^*(n - k, 0) \cdot 2^{k+1} \sum_{k_1 + \cdots + k_r = k} \binom{k}{k_1, \dots, k_r} \]

different Boolean functions.

By combining the above two cases, we get the formula for $B(n, k)$.

□


Example 5.5. As previously mentioned, the quantities $B(4, k)$ for $k = 0, \dots, 4$ were computed in [RDC14].
It is easy to check that these values are consistent with our general formula. There are $2^{2^4} = 65536$ Boolean
functions on 4 variables. The number of functions with canalizing depth exactly $k$, for $k = 1, 2, 3, 4$, is

\begin{align*}
B(4, 4) &= \binom{4}{4} (736 + 0) = 736, \\
B(4, 3) &= \binom{4}{3} (64 + 0) = 256, \\
B(4, 2) &= \binom{4}{2} (8 + 2 \cdot 8 \cdot 3) = 336, \\
B(4, 1) &= \binom{4}{1} (2 + 136 \cdot 4 \cdot 1) = 2184.
\end{align*}

Summing these yields the total number of canalizing functions on 4 variables:

\[ C_4 = 3512 = 736 + 256 + 336 + 2184 = B(4, 4) + B(4, 3) + B(4, 2) + B(4, 1). \]

Thus, there are $B(4, 0) = 65536 - 3512 = 62024$ non-canalizing functions on four variables, including
the two constant functions.
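
The arithmetic in Example 5.5 can be reproduced by coding the formulas of Lemma 5.1, Corollary 5.2, Lemma 5.3 and Theorem 5.4 directly. The Python sketch below is an illustration only; all function names are hypothetical. It treats $B(1, 1) = 2$ as a special case, as stated in the text, because Eq. (5) is only claimed for $n \geq 2$.

```python
from math import comb, factorial

def compositions(k):
    """Yield every composition of k as a tuple of positive integers."""
    if k == 0:
        yield ()
        return
    for first in range(1, k + 1):
        for rest in compositions(k - first):
            yield (first,) + rest

def multinomial(parts):
    result = factorial(sum(parts))
    for p in parts:
        result //= factorial(p)
    return result

def canalizing_count(n):
    """C_n, Lemma 5.1."""
    return 2 * ((-1) ** n - n - 1) + sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k + 1) * 2 ** (2 ** (n - k))
        for k in range(1, n + 1))

def core_count(n):
    """B*(n, 0): non-constant, non-canalizing functions (Corollary 5.2)."""
    return 2 ** (2 ** n) - canalizing_count(n) - 2

def nested_count(k):
    """B(k, k): 2 for k = 1 (stated in the text), Eq. (5) for k >= 2."""
    if k == 1:
        return 2
    return 2 ** (k + 1) * sum(multinomial(c) for c in compositions(k) if c[-1] >= 2)

def depth_count(n, k):
    """B(n, k): functions on n variables with canalizing depth exactly k."""
    if k == 0:
        return 2 ** (2 ** n) - canalizing_count(n)
    layers = sum(multinomial(c) for c in compositions(k))  # all compositions of k
    return comb(n, k) * (nested_count(k) + core_count(n - k) * 2 ** (k + 1) * layers)

print([depth_count(4, k) for k in range(5)])
```

Listing `depth_count(4, k)` for $k = 0, \dots, 4$ returns the five values of the example, and they sum to 65536.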


Note that $k$-canalizing functions are simply Boolean functions with depth at least $k$; therefore we
immediately get the following equality.

Corollary 5.6. The number of $k_0$-canalizing Boolean functions on $n$ variables, $1 \leq k_0 \leq n$, is given by:

\[ \sum_{k=k_0}^{n} B(n, k) = \sum_{k=k_0}^{n} \binom{n}{k} \Bigl[ B(k, k) + B^*(n - k, 0) \cdot 2^{k+1} \sum_{k_1 + \cdots + k_r = k} \binom{k}{k_1, \dots, k_r} \Bigr]. \]

In particular, the canalizing functions are counted by the following identity:

\[ C_n = \sum_{k=1}^{n} B(n, k) = \sum_{k=1}^{n} \binom{n}{k} \Bigl[ B(k, k) + B^*(n - k, 0) \cdot 2^{k+1} \sum_{k_1 + \cdots + k_r = k} \binom{k}{k_1, \dots, k_r} \Bigr]. \]

In both equations, the last sum is taken over all compositions of $k$.


6. Concluding remarks and future work 

Canalizing Boolean functions were inspired by structural and dynamic features of biological networks.
In this article, we extended results on NCFs and derived a unique extended monomial form of arbitrary
Boolean functions. This gave us a stratification of the set of $n$-variable Boolean functions by canalizing
depth. In particular, this form encapsulates three invariants of Boolean functions: canalizing depth,
dominance layer number and the non-canalizing core polynomial. By combining these three invariants, we
obtained an explicit formula for the number of Boolean functions on $n$ variables with depth $k$. We also
introduced the notion of $k$-canalizing Boolean functions, which we believe to be a promising framework
for modeling gene regulatory networks. Our stratification yielded closed formulas for the number of
$n$-variable Boolean functions of canalizing depth $k$. Perhaps more valuable than the exact enumerations is
the fact that now it is straightforward to derive asymptotics for the number of such functions as $n$ and $k$
grow large.

In future work, we will investigate well-known Boolean network models and compute the canalizing
depth of the proposed functions. We are working on reverse-engineering algorithms that construct Boolean
network models from partial data. In particular, we ask how one can find the function with the maximum
canalizing depth that fits the data, and whether the set of $k$-canalizing functions in the model space has an
inherent algebraic structure. Progress has been made on these problems for general Boolean functions without
paying attention to canalizing depth, and for NCFs. For example, for the general reverse-engineering
problem, the set of feasible functions (i.e., the “model space”) is a coset $f + I$ in the polynomial ring
$\mathbb{F}_2[x_1, \dots, x_n]$, where $I$ is the ideal of functions that vanish on the data set; see [HJ12]. Can we get
more refined results by restricting to $k$-canalizing functions? The set of nested canalizing functions can be
parametrized by a union of toric algebraic varieties [JL07]. It is relatively straightforward to show that the
set of $k$-canalizing functions admits a similar parametrization, but it is not clear whether this has any actual
utility for modeling.

Another avenue of current research extends the work in the electrical engineering community on the
unate cascade functions. Recall that these are precisely the NCFs, and that they are precisely the functions
whose binary decision diagrams have minimum average path length, which can be explicitly computed.
Similarly, we can compute the minimum average path length of a binary decision diagram of a $k$-canalizing
function.

Finally, much of the work in this paper should extend to multi-state (rather than Boolean)
functions. As long as $K$ is a finite field, $n$-variable functions over $K$ are polynomials in the ring
$K[x_1, \dots, x_n]$. The definition of an NCF was extended from Boolean to multi-state functions in [ML12],
where the authors also enumerated these functions. Some of the proof techniques in this current paper
specifically use the fact that $K = \mathbb{F}_2$, and it is not clear how well they would extend to general finite
fields. However, there should certainly be a stratification of multi-state functions by canalizing depth.
The problem of enumerating $k$-canalizing multi-state functions seems to be challenging but still within
reach.